Introduction to R/RStudio

Dr. Robinson

Follow along with me!

Visit https://posit.cloud/content/5458223 to access the workshop material.

Note

If you do not yet have a Posit Cloud (previously known as RStudio Cloud) account, you will need to sign up!

Outline

  • Intro to R/RStudio
  • Packages
  • Create a graph with me!
  • Continuing with R

What is R? Why R?

R is a programming language designed originally for statistical analyses.

Strengths

  • handling data with many different types of variables.
  • making polished, complex data visualizations.
  • offering cutting-edge statistical methods to users.
  • automating reports with R Markdown / Quarto documents.

Weaknesses

  • performing non-analysis programming tasks, like website creation.
  • hyper-efficient numerical computation.
  • being a simple tool for all audiences.

R vs RStudio

RStudio Integrated Development Environment (IDE)

An RStudio window is by default divided into 4 panes, each of which may contain several tabs. You can reconfigure the locations of these tabs based on your preferences by selecting the toolbar button with 4 squares (just left of the Addins dropdown menu).

R Scripts

The logo on the script file indicates the file type. When an R file is open, the Run and Source buttons at the top let you run selected lines of code (Run) or run the entire file (Source). Line numbers appear on the left, which is a handy way to see where in the code an error occurs, and the line:character position appears at the bottom left. At the bottom right, another indicator shows what type of file RStudio thinks this is.

Quarto Documents

The logo on the script file indicates the file type. When a Quarto markdown file is open, the Render button at the top compiles the file so you can see its “pretty”, non-markup form. The same toolbar has buttons to add a code chunk and to run a selected line or chunk of code. You can toggle between source mode (shown) and visual mode to see a more Word-like rendering of the Quarto markdown file. Line numbers appear on the left, which is a handy way to see where in the code an error occurs, and the line:character position appears at the bottom left. At the bottom right, another indicator shows what type of file RStudio thinks this is.

Variable Assignment <-

We assign objects in R using the syntax object_name <- value

message <- "So long and thanks for all the fish"
year <- 2025
the_answer <- 42L
earth_demolished <- FALSE

Note

The same syntax works for any type of object. We can assign names to variables, vectors, matrices, dataframes, graphs, statistical models, etc. with <-.
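To check what kind of object a name refers to, one option is `class()`. The sketch below reuses the four objects assigned above and then reassigns one of them; it is only an illustration, not part of the original workshop material.

```r
# Re-create the objects from the slide above
message <- "So long and thanks for all the fish"
year <- 2025
the_answer <- 42L
earth_demolished <- FALSE

# class() reports the type of object stored under each name
class(message)           # "character"
class(year)              # "numeric"
class(the_answer)        # "integer" (the L suffix marks a whole number)
class(earth_demolished)  # "logical"

# Assigning to an existing name replaces the old value
year <- year + 1
year                     # 2026
```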

Base R & Function Arguments

R Core Group

Since mid-1997, R has been developed by the R Core Group, whose members have write access to the R source: https://www.r-project.org/contributors.html

This group of roughly 20 volunteers is the only one that can change the base (built-in) functionality of R.

  • Base functions are a set of functions in the R programming language that are included in the base package.

  • These functions provide a wide range of functionality, including mathematical operations, statistical functions, data manipulation, and input/output operations.

vec <- seq(from = 1, to = 10, by = 2)
vec
[1] 1 3 5 7 9
mean(x = vec)
[1] 5
new_vec <- vec*2
new_vec
[1]  2  6 10 14 18
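The calls above name every argument (`from =`, `to =`, `by =`, `x =`). As a small sketch of how argument matching works, the example below shows that the same call can be written by position or with the names in a different order. The vector `vec_with_na` is made up for illustration, to show an optional argument (`na.rm`) changing the result.

```r
# Arguments can be matched by name or by position
by_name     <- seq(from = 1, to = 10, by = 2)
by_position <- seq(1, 10, 2)
reordered   <- seq(by = 2, to = 10, from = 1)

identical(by_name, by_position)  # TRUE
identical(by_name, reordered)    # TRUE

# Optional arguments have defaults: mean() keeps missing values unless told otherwise
vec_with_na <- c(1, 3, NA, 7, 9)
mean(x = vec_with_na)                # NA
mean(x = vec_with_na, na.rm = TRUE)  # 5
```

Naming arguments takes more typing, but it makes code easier to read and protects against mixing up the order.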

Packages

install.packages("tidyverse")
library(tidyverse)

Image by Michela Cameletti
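The two lines of code above do different jobs: `install.packages()` downloads a package (needed once per computer), while `library()` loads it (needed once per R session). One way to avoid reinstalling every time is sketched below. The helper `is_installed()` is a hypothetical name defined here, not a function from any package.

```r
# Hypothetical helper (not part of any package): is a package installed?
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

is_installed("stats")  # TRUE: stats ships with every R installation

# Load tidyverse if it is available; otherwise point to the install step
if (is_installed("tidyverse")) {
  library(tidyverse)
} else {
  message("tidyverse is not installed yet; run install.packages(\"tidyverse\")")
}
```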

Create a graph with me!

Graphics in R

[Illustration: a fuzzy monster in a beret and scarf critiques their own column graph on a canvas while assistant monsters carry over boxes of elements for customizing a graph, such as themes and geometric shapes. Stylized text reads “ggplot2: build a data masterpiece.”] Learn more about ggplot2.

Palmer Penguins Data

library(palmerpenguins)
data(penguins)
penguins
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_…¹ body_…² sex    year
   <fct>   <fct>              <dbl>         <dbl>      <int>   <int> <fct> <int>
 1 Adelie  Torgersen           39.1          18.7        181    3750 male   2007
 2 Adelie  Torgersen           39.5          17.4        186    3800 fema…  2007
 3 Adelie  Torgersen           40.3          18          195    3250 fema…  2007
 4 Adelie  Torgersen           NA            NA           NA      NA <NA>   2007
 5 Adelie  Torgersen           36.7          19.3        193    3450 fema…  2007
 6 Adelie  Torgersen           39.3          20.6        190    3650 male   2007
 7 Adelie  Torgersen           38.9          17.8        181    3625 fema…  2007
 8 Adelie  Torgersen           39.2          19.6        195    4675 male   2007
 9 Adelie  Torgersen           34.1          18.1        193    3475 <NA>   2007
10 Adelie  Torgersen           42            20.2        190    4250 <NA>   2007
# … with 334 more rows, and abbreviated variable names ¹​flipper_length_mm,
#   ²​body_mass_g

Learn more about the data!

Learn more about the Palmer Penguins data set here.
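The printed tibble abbreviates some column names and shows only the first 10 rows. A few base R functions give a fuller overview; the sketch below assumes the palmerpenguins package is installed.

```r
library(palmerpenguins)

dim(penguins)             # 344 rows and 8 columns, matching the tibble header
names(penguins)           # the full, unabbreviated column names
levels(penguins$species)  # "Adelie" "Chinstrap" "Gentoo"

# Count the missing values (NA) in each column
colSums(is.na(penguins))
```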

Start with an empty canvas

library(ggplot2)
ggplot(data = penguins)

Add \(x\) and \(y\) aesthetics

ggplot(data = penguins,
       mapping = aes(x = bill_length_mm,
                     y = bill_depth_mm
                     )
       )

Add a geometric layer

ggplot(data = penguins,
       mapping = aes(x = bill_length_mm,
                     y = bill_depth_mm
                     )
       ) +
  geom_point()

Fix the axis labels and add a title

ggplot(data = penguins,
       mapping = aes(x = bill_length_mm,
                     y = bill_depth_mm
                     )
       ) +
  geom_point() +
  labs(x = "Bill Length (mm)",
       y = "Bill Depth (mm)",
       title = "Penguin Bill Size"
       )
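Each step in this walkthrough retypes the whole plot. An alternative, sketched below, is to store the plot in an object with `<-` and add layers to it later. The name `bill_plot` is a hypothetical choice; the code assumes ggplot2 and palmerpenguins are installed.

```r
library(ggplot2)
library(palmerpenguins)

# Store the plot built so far under a name
bill_plot <- ggplot(data = penguins,
                    mapping = aes(x = bill_length_mm,
                                  y = bill_depth_mm)) +
  geom_point() +
  labs(x = "Bill Length (mm)",
       y = "Bill Depth (mm)",
       title = "Penguin Bill Size")

# Nothing is drawn until the object is printed
bill_plot

# Later layers can be added to the stored plot without retyping it
bill_plot + geom_smooth(method = "lm")
```

R will warn that a couple of rows with missing bill measurements were removed; that is expected with this data set.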

Add another layer – line of best fit

ggplot(data = penguins,
       mapping = aes(x = bill_length_mm,
                     y = bill_depth_mm
                     )
       ) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "Bill Length (mm)",
       y = "Bill Depth (mm)",
       title = "Penguin Bill Size"
       )

Okay, but that’s kind of boring and doesn’t look quite right…
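One likely reason the single line looks wrong is that it pools three species together. As a rough check (a sketch, not part of the original slides), the code below fits the same straight line once for all penguins and once within each species, then compares the slopes. The object names are made up for illustration.

```r
library(palmerpenguins)

# Slope of the single line fitted to all penguins at once
overall_fit <- lm(bill_depth_mm ~ bill_length_mm, data = penguins)
overall_slope <- coef(overall_fit)[["bill_length_mm"]]

# Slope of a separate line fitted within each species
species_slopes <- sapply(
  split(penguins, penguins$species),
  function(d) coef(lm(bill_depth_mm ~ bill_length_mm, data = d))[["bill_length_mm"]]
)

overall_slope   # negative: depth appears to fall as length rises
species_slopes  # positive within each species
```

The pooled trend points the opposite way from the trend inside every group, which is why the plot needs to distinguish the species.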

Add color and shape to differentiate the species

ggplot(data = penguins,
       mapping = aes(x = bill_length_mm,
                     y = bill_depth_mm,
                     color = species,
                     shape = species
                     )
       ) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "Bill Length (mm)",
       y = "Bill Depth (mm)",
       title = "Penguin Bill Size",
       color = "Species",
       shape = "Species"
       )

Change the color scale and adjust the axis scales

ggplot(data = penguins,
       mapping = aes(x = bill_length_mm,
                     y = bill_depth_mm,
                     color = species,
                     shape = species
                     )
       ) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "Bill Length (mm)",
       y = "Bill Depth (mm)",
       title = "Penguin Bill Size",
       color = "Species",
       shape = "Species"
       ) +
  scale_color_brewer(palette = "Dark2") +
  scale_x_continuous(limits = c(30, 60), 
                     breaks = seq(30,60,10)
                     ) +
  scale_y_continuous(limits = c(10, 25), 
                     breaks = seq(10,25,5)
                     )
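A caution about `limits`: in ggplot2, observations outside the limits of a continuous scale are dropped before layers such as `geom_smooth()` are computed. The sketch below first checks that the limits used above discard no penguins, then shows `coord_cartesian()`, which zooms the view without discarding data. The zoom ranges and object names are arbitrary choices for illustration.

```r
library(ggplot2)
library(palmerpenguins)

# How many bills fall outside the limits chosen above? (NAs are ignored here)
outside_x <- sum(penguins$bill_length_mm < 30 | penguins$bill_length_mm > 60,
                 na.rm = TRUE)
outside_y <- sum(penguins$bill_depth_mm < 10 | penguins$bill_depth_mm > 25,
                 na.rm = TRUE)
c(outside_x, outside_y)  # both 0, so these limits discard no measurements

# coord_cartesian() zooms in, but the fitted line still uses every penguin
zoomed <- ggplot(data = penguins,
                 mapping = aes(x = bill_length_mm,
                               y = bill_depth_mm)) +
  geom_point() +
  geom_smooth(method = "lm") +
  coord_cartesian(xlim = c(40, 50), ylim = c(15, 20))
zoomed
```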

Pick a theme

ggplot(data = penguins,
       mapping = aes(x = bill_length_mm,
                     y = bill_depth_mm,
                     color = species,
                     shape = species
                     )
       ) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "Bill Length (mm)",
       y = "Bill Depth (mm)",
       title = "Penguin Bill Size",
       color = "Species",
       shape = "Species"
       ) +
  scale_color_brewer(palette = "Dark2") +
  scale_x_continuous(limits = c(30, 60), 
                     breaks = seq(30,60,10)
                     ) +
  scale_y_continuous(limits = c(10, 25), 
                     breaks = seq(10,25,5)
                     ) +
  theme_bw()

Facet to include island

ggplot(data = penguins,
       mapping = aes(x = bill_length_mm,
                     y = bill_depth_mm,
                     color = species,
                     shape = species
                     )
       ) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_wrap(~ island) +
  labs(x = "Bill Length (mm)",
       y = "Bill Depth (mm)",
       title = "Penguin Bill Size",
       color = "Species",
       shape = "Species"
       ) +
  scale_color_brewer(palette = "Dark2") +
  scale_x_continuous(limits = c(30, 60), 
                     breaks = seq(30,60,10)
                     ) +
  scale_y_continuous(limits = c(10, 25), 
                     breaks = seq(10,25,5)
                     ) +
  theme_bw()
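Once a plot looks right, `ggsave()` can write it to a file. The sketch below uses a shortened version of the faceted plot and saves it to a temporary folder. The object name `final_plot`, the file name, and the dimensions (in inches) are all illustrative choices.

```r
library(ggplot2)
library(palmerpenguins)

# A shortened version of the faceted plot
final_plot <- ggplot(data = penguins,
                     mapping = aes(x = bill_length_mm,
                                   y = bill_depth_mm,
                                   color = species,
                                   shape = species)) +
  geom_point() +
  facet_wrap(~ island) +
  theme_bw()

# Written to a temporary folder here; swap in your own path to keep the file
out_file <- file.path(tempdir(), "penguin_bills.pdf")
ggsave(filename = out_file, plot = final_plot, width = 8, height = 5)

file.exists(out_file)  # TRUE once the file has been written
```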

Final Graphic

Be creative with R!

And so much more!

Download R/RStudio

  1. Download and run the R installer for your operating system from CRAN:

    If you are on Windows, you should also install the Rtools4 package; this will ensure you get fewer warnings later when installing packages.

    More detailed instructions for Windows are available here.

  2. Download and install the latest version of RStudio for your operating system.

Getting Help

  • In R, you can access help with a ? or help()
?mean
help(mean)
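Help pages are not the only built-in resource. A few other base R tools are sketched below, using `mean` and `sd` as example functions.

```r
# Find objects whose names start with "mean" when you only remember part of a name
apropos("^mean")

# List the argument names a function accepts
names(formals(sd))  # "x" "na.rm"

# Run the examples from the bottom of a help page
example(mean)
```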

Welcome to the R Community!

Classes at Cal Poly

Classes in the Department of Statistics with a focus on learning R:

  • STAT 331/531: Introduction to Statistical Computing with R
  • STAT 431/541: Advanced Statistical Computing with R
  • STAT 551: Statistical Learning with R

Many other courses in the department make use of R software for the purpose of learning statistical concepts.

Questions?